2018-09-01
Data scientist @ funda
MSc in Applied Statistics
thatssorandom.com
@edwin_thoen
CRAN: padr, GGally, recipes
Who:
We don't have to use R when using R!
We can do
library(dplyr)
mtcars <- mtcars %>% mutate(cyl_drat = cyl + drat)
or
mtcars_dt <- data.table::as.data.table(mtcars)
mtcars_dt[, cyl_drat := cyl + drat]
Instead of
mtcars$cyl_drat <- mtcars$cyl + mtcars$drat
When you started using R, did you mix up?
install.packages("padr")
and
library(padr)
Or wondered why library(padr) worked, even though there is no variable called padr?
Apparently, things that ought not to work do work.
This results in a language full of magic:
subset(mtcars, cyl == 6)
ggplot2::ggplot(mtcars, aes(mpg, drat)) + geom_point()
data.table::as.data.table(mtcars)[, mean(mpg), by = cyl]
R is designed to do data science. (Well, then it was still called statistics).
Flexibility to maximize insight.
It enables the creation of DSLs: tailor-made tools that solve a specific problem without overhead.
With flexibility comes ambiguity and responsibility.
my_val <- 123
my_func <- function(x) {
x / 42 * 121
}
my_func(71)
## [1] 204.5476
my_func(my_val)
## [1] 354.3571
my_func(your_val)
## Error in my_func(your_val): object 'your_val' not found
By creating a variable we bind a value to a name.
my_val <- 123
123 is the value that is bound to the name my_val.
Binding happens in an environment, in this case the global.
Just call my name, I'll give you the value:
my_val
## [1] 123
This is evaluating the name.
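A minimal sketch of the point that a binding lives in an environment: the same name can be bound to different values in different environments. The environment other_env and the value 456 are made up for illustration.

```r
# Bind 123 to the name my_val in the current environment
my_val <- 123

# Create a second environment and bind a different value to the same name
other_env <- new.env()
assign("my_val", 456, envir = other_env)

# Evaluating the name looks in the current environment first
here_val <- my_val

# Asking explicitly for the binding in other_env gives the other value
there_val <- get("my_val", envir = other_env)
```

Which value you get depends on the environment in which the name is evaluated.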
R starts looking for the value of a name in the environment in which the name is called.
x <- "a variable in the global"
a_func <- function() {
x <- "a variable in the local"
x
}
a_func()
## [1] "a variable in the local"
When R can't find the name locally, it moves up to the parent environment (for a function, the environment in which it was created).
z <- "a variable in the global"
another_func <- function() {
z
}
another_func()
## [1] "a variable in the global"
Finally, an error is thrown when the variable can't be found.
nobody_loves_me <- function() {
y
}
nobody_loves_me()
## Error in nobody_loves_me(): object 'y' not found
So this is standard evaluation in R.
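The lookup described above can be sketched by hand. The helper find_binding below is hypothetical (it is not part of base R); it assumes the search starts in the caller's environment and walks up through the parent environments until the name is found or the chain ends.

```r
# Hypothetical helper (not part of base R): return the environment in which
# `name` is bound, walking up the chain of parent environments.
find_binding <- function(name, env = parent.frame()) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env <- parent.env(env)
  }
  stop("object '", name, "' not found", call. = FALSE)
}

lookup_demo <- function() {
  local_var <- "bound inside the function"
  here <- environment()
  found_local <- find_binding("local_var")  # found right away, locally
  found_sum <- find_binding("sum")          # found only after walking up
  list(
    local_is_here = identical(found_local, here),
    sum_is_in_base = identical(found_sum, baseenv())
  )
}
demo <- lookup_demo()

# A name that is bound nowhere ends in an error, as with nobody_loves_me()
not_found_msg <- tryCatch(
  find_binding("no_such_object_anywhere"),
  error = function(e) conditionMessage(e)
)
```

R does this walk internally; the sketch only makes the steps visible.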
We can also ask R to postpone judgement, by storing the variable name in a name object.
quote(my_unknown_var) %>% class()
## [1] "name"
This is the act of quoting, saving a variable name to be evaluated later.
(name is also called symbol)
Quoted variable names are not evaluated. It doesn't matter if they don't exist.
quoted_var <- quote(wait_for_it)
quoted_var
## wait_for_it
quoted_var %>% class()
## [1] "name"
We have two variables here:
The regular variable quoted_var contains the quoted variable wait_for_it.
It will start looking for the value only when we ask to evaluate it.
eval(quoted_var)
## Error in eval(quoted_var): object 'wait_for_it' not found
wait_for_it <- "I finally have a value"
eval(quoted_var)
## [1] "I finally have a value"
We can evaluate the name in a different environment.
diy_pull <- function(x, name) {
eval(name, envir = x)
}
diy_pull(mtcars, quote(cyl)) %>% head(5)
## [1] 6 6 4 6 8
Note that we can specify a data frame as environment. The column names can be called as variables within it.
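A small sketch of the same idea with a quoted call instead of a single name. The data frame df and the function keep_large are made up for illustration. It relies on eval's documented default: when envir is a list or data frame, names that are not columns are looked up in the frame from which eval was called.

```r
df <- data.frame(size = c(1, 5, 10))

keep_large <- function(data, cutoff) {
  # `size` is found as a column of `data`; `cutoff` is not a column, so R
  # falls back to the frame that called eval(), which is this function.
  keep <- eval(quote(size > cutoff), envir = data)
  data[keep, , drop = FALSE]
}

large <- keep_large(df, cutoff = 4)
```

Here large holds the rows with size 5 and 10.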
You'll never have to quote your function arguments when using a DSL.
mtcars %>% select(cyl)
as.data.table(mtcars)[, cyl]
ggplot(mtcars, aes(cyl)) + geom_bar()
Why does R not throw an error? There is no cyl in the global…
koala <- function(x, y) {
x + 42
}
koala(3)
## [1] 45
def koala(x, y):
    return(x + 42)

koala(3)
## TypeError: koala() takes exactly 2 arguments (1 given)
##
## Detailed traceback:
##   File "<string>", line 1, in <module>
So, R doesn't make a fuss until it really has to.
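To make the laziness concrete, here is a sketch in which the unused argument is an expression that would fail if it were ever evaluated. The variant hungry_koala is hypothetical; it uses force() to evaluate y on purpose.

```r
koala <- function(x, y) {
  x + 42
}

# y is never used inside koala, so the stop() call is never evaluated
result <- koala(3, stop("y is never evaluated"))

# Hypothetical variant that does touch y: now the error surfaces
hungry_koala <- function(x, y) {
  force(y)
  x + 42
}
msg <- tryCatch(
  hungry_koala(3, stop("y was evaluated")),
  error = function(e) conditionMessage(e)
)
```

The first call returns 45 without complaint; the second one fails as soon as y is touched.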
This allows quoting inside functions.
diy_pull_2 <- function(x, bare_name) {
name <- quote(bare_name)
eval(name, env = x)
}
diy_pull_2(mtcars, cyl)
## Error in eval(name, env = x): object 'cyl' not found
quote quotes its input literally, so here it returns the name bare_name. What we want to quote is the expression supplied to the argument, not the argument's name.
Here we need substitute:
substitute_example <- function(x) {
substitute(x)
}
substitute_example(cyl)
## cyl
substitute_example(cyl) %>% class()
## [1] "name"
diy_pull_2 <- function(x, bare_name) {
name <- substitute(bare_name)
eval(name, env = x)
}
diy_pull_2(mtcars, cyl)
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
We can quote the following things:
name: the name of an R object
call: calling of a function
pairlist: something from the past you shouldn't bother about
literal: evaluates to the value itself
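A quick sketch of three of these kinds, using class() to inspect what quote returns. The expressions are arbitrary examples; pairlist is left out, in line with the advice above.

```r
quoted <- list(
  name    = quote(cyl),       # a name (symbol)
  call    = quote(cyl == 4),  # a call to the function `==`
  literal = quote(4)          # a literal: quoting it changes nothing
)

# First class of each quoted object
kinds <- vapply(quoted, function(q) class(q)[1], character(1))

# A call can be taken apart like a list: the function first, then its arguments
parts <- as.list(quoted$call)
```

kinds comes out as "name", "call" and "numeric"; parts holds the name `==`, the name cyl and the number 4.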
Just like a name, a function call can be delayed by quoting.
my_little_filter <- function(x, call) {
call_quoted <- substitute(call)
retain_row <- eval(call_quoted, envir = x)
x[retain_row, ]
}
my_little_filter(mtcars, cyl == 4 & gear == 4) %>% head(2)
##    mpg cyl  disp hp drat   wt  qsec vs am gear carb cyl_drat
## 3 22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1     7.85
## 8 24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2     7.69
The value slot is empty at promise creation.
Only when the function evaluates the argument's expression does R start looking for its value.
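Promises can also be created by hand with base R's delayedAssign, which gives a rough way to watch the value slot being filled. This is a sketch only; the names counter and lazy_val are made up, and the counter is there just to record when the expression actually runs.

```r
counter <- 0

# Create a promise: the expression is stored, but not evaluated yet
delayedAssign("lazy_val", {
  counter <- counter + 1
  "computed on first use"
})

counter_before <- counter  # the expression has not run so far

first_use <- lazy_val      # first use forces the promise
second_use <- lazy_val     # later uses reuse the stored value

counter_after <- counter   # the expression ran exactly once
```

Function arguments behave in the same way: the expression is evaluated at most once, on first use.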
Remember koala?
koala <- function(x, y) {
x + 42
}
When we call koala we create the following promise
x_value <- 42
koala(x = x_value)
Now, that's why substitute works!
It accesses the expression in the promise without evaluating it.
subs_func <- function(val) {
vals_expr <- substitute(val)
deparse(vals_expr)
}
subs_func(anything_goes)
## [1] "anything_goes"
Note that deparse converts the expression to a character string. Its inverse is parse(text = ...).
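A round-trip sketch of deparse and parse. The expression cyl == 4 and the data frame df are arbitrary examples. parse returns an expression vector, so the call itself is its first element.

```r
expr <- quote(cyl == 4)

# Expression to character string
as_text <- deparse(expr)

# Character string back to an expression; take the first (only) element
back <- parse(text = as_text, keep.source = FALSE)[[1]]

# The round trip yields the same call, which can be evaluated as usual
same <- identical(back, expr)
df <- data.frame(cyl = c(4, 6, 4))
rows <- eval(back, envir = df)
```

as_text is the string "cyl == 4", and rows marks the first and third row of df.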
The tidyverse NSE dialect.
mtcars %>% select(cyl)
We now know that cyl gets somehow quoted by select and evaluated within the data frame.
But what if we want to wrap tidyverse code in a custom function?
This won't work
my_tv_func <- function(x, grouping_var) {
x %>%
group_by(grouping_var) %>%
summarise(max_drat = max(drat))
}
my_tv_func(mtcars, cyl)
Why?
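One way to see the problem, sketched in base R: a quoting function captures the expression exactly as it is written at its own call site. The function show_expr below is a hypothetical stand-in for a quoting function such as group_by; it is not how dplyr is implemented.

```r
# Hypothetical quoting function: returns the expression it was given, as text
show_expr <- function(arg) {
  deparse(substitute(arg))
}

# A wrapper that passes its argument straight on
wrapper <- function(grouping_var) {
  show_expr(grouping_var)
}

direct <- show_expr(cyl)   # sees the expression cyl
wrapped <- wrapper(cyl)    # sees the expression grouping_var
```

Inside the wrapper, the quoting function never sees cyl. It sees the name grouping_var, and a data frame like mtcars has no column of that name.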
In order to get it to work:
my_tv_func <- function(x, grouping_var) {
x %>%
group_by(!!grouping_var) %>%
summarise(max_drat = max(drat))
}
my_tv_func(mtcars, quo(cyl))
Just like with substitute, you can quote the argument's expression with enquo.
my_grouping_func <- function(x, grouping_var) {
grouping_var_q <- enquo(grouping_var)
x %>%
group_by(!!grouping_var_q) %>%
summarise(max_drat = max(drat))
}
my_grouping_func(mtcars, cyl)
## # A tibble: 3 x 2
##     cyl max_drat
##   <dbl>    <dbl>
## 1     4     4.93
## 2     6     3.92
## 3     8     4.22
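For comparison, a rough base R analogue of the enquo and !! pattern: substitute captures the argument's expression, and bquote with .() splices it into a new call before that call is evaluated. The functions count_by and count_by_wrapped and the data frame df are made up for illustration; this is an analogy, not how the tidyverse implements it.

```r
# A quoting function: counts the values of a column given as a bare name
count_by <- function(data, var) {
  var_q <- substitute(var)
  table(eval(var_q, envir = data))
}

# A wrapper around it: capture the expression, then splice it into the call
count_by_wrapped <- function(data, grouping_var) {
  grouping_var_q <- substitute(grouping_var)              # plays the role of enquo
  call_to_run <- bquote(count_by(data, .(grouping_var_q)))  # .() plays the role of !!
  eval(call_to_run)
}

df <- data.frame(kind = c("a", "b", "a"))
res <- count_by_wrapped(df, kind)
```

The call that is finally evaluated reads count_by(data, kind), so count_by quotes kind rather than grouping_var.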
my_correct_second_little_filter <- function(x, bare_call) {
call <- substitute(bare_call)
x[eval(call, envir = x), ]
}
my_correct_second_little_filter(mtcars, cyl == 4) %>% head(1)
##    mpg cyl disp hp drat   wt  qsec vs am gear carb cyl_drat
## 3 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1     7.85
cyl == 4 on its own is invalid: there is no cyl in the global environment.
substitute gets the expression, which is the quoted call.
eval evaluates the call within x, where it finds the cyl column.
quasiquotation
quosures
environments
@edwin_thoen
github.com/EdwinTh/satRday
edwinth.github.io/blog/nse
edwinth.github.io/blog/dplyr-recipes